SIGNIFICANCE AND EXPLANATION


Forcible  entry  has  been  made  to  a  building,  by  breaking  a  window,  and  a 
crime committed. A suspect is later found to have fragments of window glass
adhering to his clothing. Measurements of the refractive indices of the glass
at  the  scene  of  the  crime  and  on  the  suspect's  clothing  are  made  and  found  to 
be  similar.  What  evidence  is  there  that  the  clothing  glass  came  from  the 
window? 

The  topic  has  been  much  discussed  in  the  forensic  science  literature.  I 
(Biometrika,  64,  207-213  (1977))  gave  a  solution.  This  has  been  criticized  by 
Shafer  in  a  paper  to  appear  in  the  J.  Amer.  Statist.  Assoc.  (1981).  This 
report  is  a  reply  to  Shafer  prepared  at  the  request  of  the  editor. 

The problem is of general importance in addressing two fundamental
issues: how we should measure the strength of an apparent coincidence
(between the two types of glass), and the principle that no unusual event
should be considered without reference to alternatives (what is the usual
value for the refractive index of window glass?).


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


REPLY  TO  SHAFER:  LINDLEY'S  PARADOX 


D.  V.  Lindley 

The  supporter  of  a  theory  should  welcome  good  criticism:  and  I  know  of 
no better critic of the Bayesian viewpoint than Shafer. If the theory
survives the criticism, then it is enhanced, and the more so the better the critique.
In  my  view,  Bayesian  ideas  come  out  of  Shafer's  analysis  rather  well. 

1. Reliability of evidence. It is not always recognized that only the
relevant  probability  matters:  whether  that  probability  is  based  on  strong  or 
weak  evidence  is  immaterial.  Shafer  is  wrong  when  he  says  "he  ought  also  to 
weigh  the  reliability  of  the  evidence".  Consider  the  following  example.  An 
urn contains a large number of balls, each of which is coloured either red or
black; one of them is to be drawn at random and a prize awarded if the ball is
red. Contrast two situations. In the first the proportion of red balls is
known to be 1/2. In the second the proportion p is unknown but is described
by a probability density f(p) with mean 1/2. As far as the prize is
concerned the relevant probability is that of a red ball being drawn, which is
1/2 in both situations. The fact that the knowledge of p is less reliable in
the  second  case  is  irrelevant.  Tversky  (1974)  reports  that  in  a  choice 
between  the  two  situations  subjects  incoherently  prefer  the  first.  Shafer 
appears  to  share  their  view  when  he  discounts  the  histogram  evidence,  for  only 
the  probability  of  guilt  is  relevant. 

The  reason  for  the  confusion  is  that  the  irrelevant  aspects  can  become 
relevant  if  the  problem  is  changed  and  a  different  probability  required.  To 
see  this  modify  the  examples  to  where  two  balls  are  to  be  drawn  and  the  prize 
awarded if they are of the same colour. The relevant probability for a given
p is $p^2 + (1-p)^2$. This is 1/2 in the first case but
$\int [p^2 + (1-p)^2] f(p)\,dp$ in the second. This is easily evaluated to
give $1/2 + 2\sigma^2$, where $\sigma^2$ is the
variance  of  p.  Now  the  situations  are  distinguishable.  There  are  similarly 
aspects  of  the  histogram  evidence  that  would  be  relevant  for  some  questions, 
but  for  the  question  of  guilt  the  strength  of  that  evidence  does  not  matter 
any  more  than  did  that  about  p  in  the  example. 
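
The two-ball calculation can be checked with a short computation. The sketch below assumes, purely for illustration, that f(p) is a Beta(a, b) density with a = b, so that its mean is 1/2, and that the urn is large enough for the two draws to be treated as independent given p. The Beta family and the function names are assumptions, not part of the original argument.

```python
from fractions import Fraction

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) density (illustrative choice of f(p))."""
    a, b = Fraction(a), Fraction(b)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def prob_red(a, b):
    """P(a single ball is red) = E[p]; only the mean of f(p) matters."""
    return beta_mean_var(a, b)[0]

def prob_same_colour(a, b):
    """P(two balls match) = E[p^2 + (1-p)^2] = 1 - 2E[p] + 2E[p^2]."""
    mean, var = beta_mean_var(a, b)
    second_moment = var + mean ** 2
    return 1 - 2 * mean + 2 * second_moment

mean, var = beta_mean_var(2, 2)
print(prob_red(2, 2))          # one draw: equals the mean
print(prob_same_colour(2, 2))  # two draws: equals 1/2 + 2 * variance
```

With any symmetric choice of a and b the single-draw probability stays at 1/2, while the two-draw probability moves with the variance of p.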

2. Behavioural assessment. In discounting the histogram evidence,
Shafer uses a rate $\alpha$. What does this number mean? He argues that a
behavioural interpretation is not necessary, but other than by behaviour how
can we understand $\alpha$? Bayesian arguments are firmly based on behaviour.
Shafer  claims  that  "Bayesian  theory  uses  canonical  examples  where  the  truth  is 
generated  according  to  known  chances".  This  is  a  possibility,  but  not  the 
only one. Thus Ramsey's (1931) canonical form is "an ethically neutral
proposition of degree of belief 1/2". An event is ethically neutral if you do
not mind whether it is true or false. It has degree of belief 1/2 if you are
indifferent  between  receiving  a  prize  contingent  on  the  truth,  or  on  the 
falsehood  of  the  event.  No  chance  element  enters  here.  Or  take  de  Finetti's 
(1974)  scoring  rule  in  which  belief  a  for  an  event  A  attracts  a  penalty 
score $(A-a)^2$, where A also denotes the indicator function of A. Here no
canonical form is used. (Incidentally this method is available for any scoring
rule and not just the quadratic.) If such a rule is applied to belief
functions, in which A and 1 - A may have beliefs that add to less than
one, then these beliefs will never attract a smaller score than those based on
probabilities: Lindley (1981). Shafer's procedure is inadmissible for any
scoring rule.
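
The quadratic case of this claim can be illustrated directly. The sketch below is not taken from Lindley (1981). It assumes the total penalty is the sum of the quadratic scores for A and for its complement, and the belief values are hypothetical.

```python
def quadratic_penalty(belief_a, belief_not_a, a_is_true):
    """Sum of quadratic penalties for the beliefs in A and in not-A."""
    ind_a = 1.0 if a_is_true else 0.0
    return (ind_a - belief_a) ** 2 + ((1.0 - ind_a) - belief_not_a) ** 2

def nearest_probabilities(belief_a, belief_not_a):
    """Shift both beliefs by the same amount so that they add to one."""
    shift = (1.0 - belief_a - belief_not_a) / 2.0
    return belief_a + shift, belief_not_a + shift

beliefs = (0.3, 0.4)                     # hypothetical beliefs adding to 0.7
probs = nearest_probabilities(*beliefs)  # coherent values adding to one

# Compare the penalties whichever way A turns out.
for a_is_true in (True, False):
    print(a_is_true,
          quadratic_penalty(*beliefs, a_is_true),
          quadratic_penalty(*probs, a_is_true))
```

In this example the coherent pair attracts the smaller penalty whether A is true or false, which is what inadmissibility of the original pair means for the quadratic rule.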


3. Comparisons of small probabilities. Shafer rightly points out that
in  the  forensic  case  the  Bayesian  method  compares  two  probabilities  (of  the 
data  on  the  null  and  alternative  hypotheses)  both  of  which  are  typically 
small,  and  he  suggests  this  is  unsatisfactory.  This  is  not  so:  the 
comparison  of  small  probabilities  is  the  usual  situation  because  most  things 
that  happen  to  us  have  low  probability;  we  go  through  life  experiencing  rare 
events.  You  are  giving  a  lecture  and  collect  a  list  of  the  students'  names. 
Afterwards you look at the list and see that the probability of those names is
very low. If there were only 10 possible names and 8 students, it is
$10^{-8}$ (and this includes the case where all the names are the same). We pass
the  coincidence  by  unless  we  can  think  of  another  hypothesis  that  increases 
the  small  probability  substantially.  It  is  a  basic,  important  principle  of 
life  that  we  should  only  judge  things  in  comparison  with  other  things.  Neyman 
and  Pearson  taught  us  this  in  statistics:  compare  p(x|H)  with  p(x|H').  In 
the  forensic  case  any  measurement  on  the  suspect  has  low  probability  -  indeed, 
in  the  ultimate,  perfectly  accurate,  mathematical  fiction,  it  has  probability 
zero.  It  is  therefore  appropriate  that  two  low  values  should  be  compared. 

4.  The  soundness  of  legal  arguments.  There  is  a  tacit  assumption  in 
some  philosophical  and  statistical  writing  about  legal  matters  that  the  law  is 
right.  Cohen  (1977)  makes  this  rather  explicit  in  his  book.  And  Shafer  seems 
to  support  the  view  when  he  argues  that  defense  counsel  would  attack  the 
Bayesian argument on the grounds discussed in my section 3. To this my answer
is  that  we  should  not  accept  legal  arguments  uncritically  but  compare  them 
with  those  suggested  by  the  coherent  approach  to  see  the  merits  of  each.  When 
we  do  this  we  see  that  the  essentially  destructive  nature  of  arguments  used  by 
counsel  is  unsatisfactory  because  it  does  not  involve  consideration  of 
alternatives. Finkelstein (1978) makes the sensible suggestion that a defense
counsel should be required to produce alternative, positive proposals for the
prosecution to criticize. Any competent lawyer could destroy any scientific
theory.

On  another  legal  matter  Shafer  suggests  that  weighing  of  evidence  is  not 
allowed by witnesses. Here the legal and Bayesian arguments do not conflict.
My suggestion to the forensic scientist is that he should give the
probabilities of the data (evidence) both on the supposition of guilt and on
that of innocence. The jury can then process these values by taking their
ratio and multiplying by the odds without the forensic evidence, thereby
performing the weighting. In general, it is the task of the witness to
provide all or part of the likelihood. The expert should not do what Shafer
suggests and testify that there are "very great odds for the hypothesis" since
he has no right to speak to the prior probability. This was the basic mistake
made in the Collins case where the likelihood ratio was fairly sound and
large, but the final odds were only modest because the prior odds were so
small.
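
The division of labour between witness and jury is the odds form of Bayes' theorem: posterior odds equal the likelihood ratio times the prior odds. A minimal sketch follows. The numbers are hypothetical and are not those of the Collins case.

```python
def posterior_odds(p_data_given_guilt, p_data_given_innocence, prior_odds):
    """Posterior odds of guilt = likelihood ratio * prior odds."""
    likelihood_ratio = p_data_given_guilt / p_data_given_innocence
    return likelihood_ratio * prior_odds

def odds_to_probability(odds):
    """Convert odds in favour of an event to a probability."""
    return odds / (1.0 + odds)

# Hypothetical values: the witness supplies the two data probabilities,
# the jury supplies the odds without the forensic evidence.
odds = posterior_odds(1e-3, 1e-6, prior_odds=1e-4)
print(odds, odds_to_probability(odds))
```

Both data probabilities here are small, yet their ratio is large. The final odds nevertheless remain modest because the prior odds are small.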

5. "Lumpiness". Before tackling the specific issue let me make a
general point. Probability is a function of two arguments: the event being
assessed, A, and the conditions under which the assessment is being made,
H. We write p(A|H). Probability is often taught as if it were part of
measure  theory.  This  ignores  much  of  the  beauty  and  importance  of  the 
subject,  for  it  is  a  measure  only  as  a  function  of  A  -  not  as  a  function  of 
H.  The  relevance  of  this  general  remark  here  is  that  my  paper  had  a  specific 
H,  namely  the  histogram  of  Figure  B.  As  other  evidence  is  accumulated,  H 
changes  and  so  may  the  probability.  Shafer,  just  as  a  lawyer  would,  brings  in 
additional evidence; and it is right that they should do so. For example the
other histograms tell us more about window glass. But it is unreasonable to
criticize p(A|H) because it is not p(A|H'). In science, and in law, we
should include all in H that we reasonably and economically can.

Now  for  the  lumpiness.  Let  me  make  the  assumption  that  the  measurements 
that  lead  to  the  histograms  are  made  with  the  same  precision  as  those  in  the 
trial  evidence.  Then  the  quantity  required  is 

$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[-\frac{(y-\theta)^2}{2\sigma^2}\right] \pi(\theta)\,d\theta$$

where $\pi(\theta)$, or more correctly $\pi(\theta \mid D)$, is the probability of $\theta$ given
the histogram evidence D. This is equal to $p(y \mid y_1, y_2, \ldots, y_n)$, where
$D = (y_1, y_2, \ldots, y_n)$ and all the y's are judged exchangeable. So all we are
saying is that y is just like the fire data $\{y_i\}$. The question therefore
reduces essentially to evaluating the density function of the y's and the
statistical literature is rich in useful methods. (I did a rather "sloppy"
job here because my concern in the paper was to emphasize other points.) All
these methods use smoothing and the better ones estimate the smoothing
hyperparameter. If there is additional evidence about the smoothing then this
could be incorporated into the prior. Actually it is clear that this quantity is
not much affected by lumpiness. For example, it will scarcely be altered if 30
values are all at y or 30 values are spread over $y \pm \sigma$, for it is a
smoothed version of $\pi(\theta)$, smoothed by the error in y.
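
The effect of the smoothing can be illustrated with a small calculation. The sketch below assumes, as a simplification not made in the original, that the density of the true index is represented by equal point masses at the histogram values, so that the smoothed density at y becomes a sum of normal error densities. The grid of 30 values and all names are hypothetical.

```python
import math

def normal_density(y, theta, sigma):
    """Normal error density of a measurement y about a true index theta."""
    z = (y - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def smoothed_contribution(y, thetas, sigma):
    """Contribution to the smoothed density at y from point masses at thetas."""
    return sum(normal_density(y, t, sigma) for t in thetas)

y, sigma, n = 0.0, 1.0, 30
lumped = [y] * n                                                    # all 30 values at y
spread = [y - sigma + 2.0 * sigma * i / (n - 1) for i in range(n)]  # even grid on y +/- sigma

ratio = smoothed_contribution(y, spread, sigma) / smoothed_contribution(y, lumped, sigma)
print(ratio)
```

Under these assumptions the ratio lies between 0.8 and 0.9. The remaining histogram values contribute the same amount in both cases, so the change in the whole smoothed density is smaller still.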

There  is  a  point  where  the  lumpiness  does  matter.  In  my  paper  the 
assumption  was  made  that  if  the  glass  on  the  suspect's  clothing  did  truly 
match $\theta_0$ then he was guilty. But if there are lumps, this may not be so, for
there may be several windows with index $\theta_0$ and all that the evidence could
show is that the glass came from one of these, not necessarily from the window
at the scene of the crime.

6. Miscellaneous comments. Seheult (1978) and Grove (1980) have both
commented on my paper and their criticism is worth studying, although neither
makes reference to the fact that their proposals are incoherent.

It  was  assumed  in  that  paper  that  the  glass  was  window  and  not,  for 
example,  bottle  glass.  My  understanding  was  that  it  was  possible  to 
distinguish  between  the  various  broad  types  of  glass. 

A  problem  that  does  need  analysis  is  that  suggested  by  Shafer  in  his 
second  comment  in  5.3  when  more  than  one  piece  of  window  glass  is  found  on  the 
suspect.  There  are  several  possibilities:  none  of  the  glass  came  from  the 
broken  window,  only  one  piece  did,  two  pieces  did,  and  so  on.  It  becomes  a 
little messy to compare all the possibilities.

Is  Shafer  correct  when  he  refers  to  the  precision  of  an  average?  Is  he 
not  confusing  precision  with  accuracy?  Precision  may  be  measured  by  the 
inverse  of  the  variance:  accuracy  by  the  inverse  of  the  mean-square  error. 
Because  scientific  measurements  typically  contain  unknown  and  undetected 
biases,  precision  can  increase  without  limit  but  not  accuracy.  Statisticians 
with  their  emphasis  on  standard  errors  that  ignore  the  bias  have  confused  the 
issue  in  some  scientific  experimentation  because  the  error  they  quote  is 
substantially  less  than  the  true  error. 
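
The distinction can be made concrete for the mean of n measurements. The sketch below assumes independent errors with common standard deviation sigma and a fixed bias shared by every measurement, so that the mean-square error is the variance plus the squared bias. The numerical values are hypothetical.

```python
def precision_of_mean(sigma, n):
    """Inverse of the variance of the mean of n independent measurements."""
    return n / sigma ** 2

def accuracy_of_mean(sigma, bias, n):
    """Inverse of the mean-square error, where MSE = variance + bias^2."""
    return 1.0 / (sigma ** 2 / n + bias ** 2)

sigma, bias = 1.0, 0.1  # hypothetical error s.d. and undetected bias
for n in (1, 100, 10_000, 1_000_000):
    print(n, precision_of_mean(sigma, n), accuracy_of_mean(sigma, bias, n))
```

Precision grows in proportion to n, whereas accuracy can never exceed the inverse of the squared bias.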

There  is  one  unsatisfactory  feature  of  the  Bayesian  analysis  that  Shafer 
does  not  mention.  It  is  sensitive  to  the  error  distribution.  For  example,  if 
$(Y-\theta)/\sigma$ has a t-distribution on 5 degrees of freedom, then at $Y-\theta = 2\sigma$
the likelihood is 0.171 times its value at $Y = \theta$, compared with 0.135
for the normal: at $4\sigma$ the values are $1.35 \times 10^{-2}$ and $3.35 \times 10^{-4}$,
respectively.  We  need  more  information  about  the  tails  of  the  error 
distribution. 
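
The figures quoted for the two error distributions can be reproduced from the density ratios. The sketch below assumes that the standardized error is compared on the same scale in both cases. The function names are illustrative.

```python
import math

def normal_ratio(z):
    """Normal density at z divided by its value at zero."""
    return math.exp(-0.5 * z * z)

def student_t_ratio(z, df):
    """Student t density (df degrees of freedom) at z divided by its value at zero."""
    return (1.0 + z * z / df) ** (-(df + 1) / 2.0)

# Relative likelihood at 2 and 4 standardized units from the centre.
for z in (2.0, 4.0):
    print(z, student_t_ratio(z, 5), normal_ratio(z))
```

At four standardized units the t-distribution on 5 degrees of freedom gives a relative likelihood roughly forty times that of the normal, which is why the tails of the error distribution matter.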

There  is  room  for  improvement  in  the  details  of  the  Bayesian  analysis  of 
forensic  data  but  the  basic  principles  seem  untouched  by  the  criticism  offered 
in  the  paper. 